A fully autonomous AI calling agent that connects directly to a local FritzBox SIP server. No Twilio, no cloud VoIP, no per-minute billing. Custom voice AI runs locally, handles real conversations with prospects, manages call logic, qualification, and retries — all entirely in-house.
The standard approach for AI calling systems relies on cloud VoIP providers like Twilio — which works, but at scale, the per-minute costs become a significant line item. This project takes a different path: direct SIP registration on a local FritzBox router, cutting out the cloud middleman entirely.
The result is a system that runs fully on-premises on commodity hardware, scales to 100+ concurrent calls, and costs nothing per minute to operate. Every component — from the telephony layer to the AI conversation engine — is owned and controlled in-house.
The engine registers as a SIP client directly on the FritzBox. Outgoing calls are initiated over SIP, with audio streams handled via WebRTC. The voice AI processes audio in real time, generating contextually appropriate responses while managing call state and qualification logic.
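The sketch below illustrates the registration step. It builds the REGISTER request that a SIP client sends to its registrar (RFC 3261). It also computes the MD5 digest answer to the registrar's 401 challenge, using the RFC 2617 variant without `qop`.

The sketch assumes the FritzBox has an IP telephone configured with a username and password. The extension `620`, the host `fritz.box`, and the LAN address used below are hypothetical placeholders. The helper names are illustrative and do not come from this project.

```python
from __future__ import annotations

import hashlib
import uuid


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()


def digest_response(user: str, password: str, realm: str, nonce: str,
                    method: str, uri: str) -> str:
    """RFC 2617 digest response for a challenge that carries no qop."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def build_register(user: str, registrar: str, local_ip: str, local_port: int,
                   cseq: int, call_id: str, from_tag: str,
                   expires: int = 3600, authorization: str | None = None) -> str:
    """Build a minimal REGISTER request (RFC 3261, UDP transport)."""
    uri = f"sip:{registrar}"
    # "z9hG4bK" is the RFC 3261 magic cookie; the rest must be unique per transaction.
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"REGISTER {uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{registrar}>;tag={from_tag}",
        f"To: <sip:{user}@{registrar}>",
        f"Call-ID: {call_id}",       # keep constant across re-registrations
        f"CSeq: {cseq} REGISTER",    # increment for every new REGISTER
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
        f"Expires: {expires}",
    ]
    if authorization:
        lines.append(f"Authorization: {authorization}")
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical values for a FritzBox on its default LAN.
print(build_register("620", "fritz.box", "192.168.178.50", 5060, 1, "call-abc", "tag1"))
```

A typical exchange runs in three steps:

1. The first REGISTER goes out without credentials.
2. The registrar answers `401 Unauthorized` with a `WWW-Authenticate: Digest realm=..., nonce=...` header.
3. The client resends with the same Call-ID and an incremented CSeq. The resend adds an `Authorization: Digest username=..., realm=..., nonce=..., uri=..., response=...` header built from `digest_response`.

If the challenge advertises `qop=auth`, the response must also fold in `nc` and `cnonce`. This sketch omits that case.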
Maintaining persistent SIP registration on consumer-grade FritzBox hardware required custom keepalive logic and graceful re-registration after network drops.
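The sketch below shows one way to structure keepalive and re-registration with Python's `asyncio`. It assumes a hypothetical `register` coroutine that performs the REGISTER/401 exchange. That coroutine returns the `Expires` value the FritzBox granted, and raises `OSError` (or times out) when the network is down.

The loop refreshes at half the granted lifetime. On failure, it retries with capped exponential backoff plus jitter. The fraction and cap are illustrative values, not the project's actual settings.

```python
from __future__ import annotations

import asyncio
import random
from typing import Awaitable, Callable


def refresh_delay(granted_expires: int, fraction: float = 0.5) -> float:
    """Seconds to wait before refreshing: a fraction of the granted lifetime, at least 1 s."""
    return max(1.0, granted_expires * fraction)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter, capped so long outages still retry regularly."""
    return random.uniform(0.0, min(cap, base * 2 ** min(attempt, 16)))


async def keep_registered(
    register: Callable[[], Awaitable[int]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    max_cycles: int | None = None,
) -> None:
    """Keep a SIP registration alive.

    `register` is a hypothetical coroutine that sends REGISTER and returns
    the Expires value granted by the registrar. `max_cycles` exists only so
    the loop can be bounded in tests; None means run forever.
    """
    attempt = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        try:
            granted = await register()
        except (OSError, asyncio.TimeoutError):
            # Network drop or no answer: back off, then try a fresh REGISTER.
            await sleep(backoff_delay(attempt))
            attempt += 1
            continue
        attempt = 0  # success resets the backoff
        await sleep(refresh_delay(granted))
```

Refreshing well before expiry means a single lost packet does not drop the registration. The backoff cap keeps retries frequent enough to recover quickly once the router is reachable again.

Periodic re-registration is only one keepalive strategy. Another common option is to send CRLF pings or SIP OPTIONS requests between refreshes.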
Streaming live audio from SIP calls to the AI model with sub-200 ms latency required custom RTP packet handling and efficient audio buffering.
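The sketch below shows two pieces that custom RTP handling usually starts with:

- A parser for the fixed 12-byte RTP header defined in RFC 3550.
- A small reorder (jitter) buffer that puts late packets back in sequence, including across the 16-bit sequence-number wraparound.

The buffer depth of 3 is an assumption. With 20 ms packets (a common G.711 packetization), it adds roughly 3 × 20 = 60 ms of delay. That delay has to fit inside the sub-200 ms budget alongside speech recognition and synthesis.

```python
from __future__ import annotations

import heapq
import itertools
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    seq: int
    timestamp: int
    ssrc: int
    payload_type: int
    marker: bool
    payload: bytes


def parse_rtp(data: bytes) -> RtpPacket:
    """Parse an RTP packet (RFC 3550, section 5.1)."""
    if len(data) < 12:
        raise ValueError("shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)           # skip the CSRC list
    if b0 & 0x10:                            # X bit: header extension follows
        if len(data) < offset + 4:
            raise ValueError("truncated header extension")
        _profile, words = struct.unpack("!HH", data[offset:offset + 4])
        offset += 4 + 4 * words
    end = len(data)
    if b0 & 0x20:                            # P bit: last byte counts padding bytes
        end -= data[-1]
    if offset > end:
        raise ValueError("header longer than packet")
    return RtpPacket(seq, ts, ssrc, b1 & 0x7F, bool(b1 & 0x80), data[offset:end])


class JitterBuffer:
    """Reorders packets, releasing the oldest once more than `depth` are held."""

    def __init__(self, depth: int = 3):
        self.depth = depth
        self._heap: list[tuple[int, int, RtpPacket]] = []
        self._tiebreak = itertools.count()   # keeps heap entries comparable
        self._highest: int | None = None     # highest extended seq seen so far
        self._released: int | None = None    # extended seq of the last packet handed out

    def _extend(self, seq: int) -> int:
        """Map a 16-bit seq onto a monotonic counter, handling wraparound."""
        if self._highest is None:
            ext = seq
        else:
            ext = (self._highest & ~0xFFFF) + seq
            if ext - self._highest > 0x8000:
                ext -= 0x10000               # late packet from the previous cycle
            elif self._highest - ext > 0x8000:
                ext += 0x10000               # sequence number wrapped forward
        if self._highest is None or ext > self._highest:
            self._highest = ext
        return ext

    def push(self, pkt: RtpPacket) -> list[RtpPacket]:
        """Add a packet; return any packets now ready for playout, in order."""
        ext = self._extend(pkt.seq)
        if self._released is not None and ext <= self._released:
            return []                        # arrived after its slot was played out: drop
        heapq.heappush(self._heap, (ext, next(self._tiebreak), pkt))
        out = []
        while len(self._heap) > self.depth:
            ext_out, _, ready = heapq.heappop(self._heap)
            self._released = ext_out
            out.append(ready)
        return out
```

The example packets below arrive out of order across the wrap from 65535 to 0. They still leave the buffer as 65534, 65535, 0.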
Handling 100+ simultaneous calls on a single server was solved with non-blocking I/O and by isolating each call in its own managed context.
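Below is a minimal `asyncio` sketch of per-call isolation:

- Every call gets its own `CallContext` and its own task.
- A semaphore caps concurrency at 100.
- `asyncio.gather(..., return_exceptions=True)` keeps one failed call from cancelling the rest.

Python's `asyncio` is a stand-in chosen for illustration, since the project description does not name its runtime. `handle_call` is a hypothetical placeholder for the real SIP, RTP, and AI work of a call.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class CallContext:
    """State owned by exactly one call; never shared between tasks."""
    call_id: str
    number: str
    state: str = "dialing"
    transcript: list[str] = field(default_factory=list)


async def handle_call(ctx: CallContext) -> str:
    """Hypothetical placeholder for the real per-call SIP/RTP/AI work."""
    await asyncio.sleep(0.01)                 # stands in for non-blocking network I/O
    if ctx.number.endswith("13"):             # simulated failure to show isolation
        raise ConnectionError(f"{ctx.call_id}: callee unreachable")
    ctx.transcript.append("greeting played")
    ctx.state = "done"
    return ctx.state


async def run_campaign(numbers: list[str], max_concurrent: int = 100) -> dict[str, object]:
    """Dial every number, at most `max_concurrent` at a time."""
    limit = asyncio.Semaphore(max_concurrent)

    async def one(i: int, number: str) -> str:
        ctx = CallContext(call_id=f"call-{i}", number=number)
        async with limit:                     # at most max_concurrent calls in flight
            return await handle_call(ctx)

    results = await asyncio.gather(
        *(one(i, n) for i, n in enumerate(numbers)), return_exceptions=True
    )
    return dict(zip(numbers, results))
```

Each context is created inside its own task and never shared. A crash or timeout in one call therefore surfaces as an exception in that call's result slot instead of corrupting another conversation's state.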
Making the AI sound natural and letting it cope with interruptions, pauses, and unexpected responses relied on a custom state machine that manages conversation context across turns.
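A turn-taking state machine of the kind described might look like the sketch below. The states, events, and reprompt limit are assumptions for illustration. The key transitions are:

- **Barge-in:** the caller starts talking while the bot is speaking, so playback stops and the bot listens.
- **Resumed speech:** a pending reply is discarded when the caller resumes mid-thought.
- **Silence:** the bot reprompts a bounded number of times before hanging up.

The machine only emits action names. A hypothetical media layer would carry them out.

```python
from enum import Enum, auto


class State(Enum):
    SPEAKING = auto()
    LISTENING = auto()
    THINKING = auto()
    ENDED = auto()


class Event(Enum):
    USER_STARTED = auto()     # voice activity detected on the caller's side
    USER_STOPPED = auto()     # end of the caller's turn
    REPLY_READY = auto()      # AI response has been generated
    PLAYBACK_DONE = auto()    # bot finished speaking
    SILENCE_TIMEOUT = auto()  # caller said nothing for too long
    HANGUP = auto()


class Conversation:
    def __init__(self, max_reprompts: int = 2):
        self.state = State.SPEAKING       # the bot opens with a greeting
        self.reprompts = 0
        self.max_reprompts = max_reprompts
        self.actions: list[str] = []      # side effects for the (hypothetical) media layer

    def handle(self, event: Event) -> State:
        s = self.state
        if event is Event.HANGUP:
            self.state = State.ENDED
        elif s is State.SPEAKING and event is Event.USER_STARTED:
            self.actions.append("stop_playback")   # barge-in: yield the floor
            self.state = State.LISTENING
        elif s is State.SPEAKING and event is Event.PLAYBACK_DONE:
            self.state = State.LISTENING
        elif s is State.LISTENING and event is Event.USER_STOPPED:
            self.reprompts = 0                      # the caller engaged again
            self.actions.append("generate_reply")
            self.state = State.THINKING
        elif s is State.LISTENING and event is Event.SILENCE_TIMEOUT:
            if self.reprompts < self.max_reprompts:
                self.reprompts += 1
                self.actions.append("play_reprompt")
                self.state = State.SPEAKING
            else:
                self.actions.append("hang_up")
                self.state = State.ENDED
        elif s is State.THINKING and event is Event.USER_STARTED:
            self.actions.append("discard_reply")   # caller kept talking; their turn wasn't over
            self.state = State.LISTENING
        elif s is State.THINKING and event is Event.REPLY_READY:
            self.actions.append("play_reply")
            self.state = State.SPEAKING
        # Any other (state, event) pair is ignored, e.g. a stale REPLY_READY while listening.
        return self.state
```

For example, suppose a caller interrupts the greeting and then answers. That produces the actions `stop_playback`, `generate_reply`, and `play_reply`, ending back in `SPEAKING`.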